PATENT



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



In re Application of:

ANITA WAI-LING HUANG, et al.

Serial No.: 09/513,058 

Filed: February 24, 2000 

For: SYSTEM AND METHOD FOR CLASSIFYING ELECTRONICALLY POSTED DOCUMENTS



Group Art Unit: 2178 
Examiner: A. Baschoar 



We, the undersigned, are the Applicants for the above-identified patent application and
hereby declare the following:

1) The pending claims of our above-identified patent application were
rejected under 35 U.S.C. § 103(a) based on U.S. Patent No.
5,913,208 to Brown et al., which is entitled "Identifying Duplicate
Documents from Search Results Without Comparing Document
Content" and issued on June 15, 1999 ("Brown").

2) The invention claimed in the above-identified patent application
was reduced to writing in the United States prior to the June 15,
1999 issue date of the Brown reference. Attached hereto is the
relevant portion of an Invention Disclosure on which the
above-identified patent application was based. This Invention Disclosure
was prepared prior to June 15, 1999.

We, the undersigned, hereby declare that all statements made herein of our own
knowledge are true and that all statements made on information and belief are believed to be
true; and further that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both, under 18
U.S.C. § 1001 and that such willful false statements may jeopardize the validity of the
application or any patent issued thereon.

Name: Anita Wai-Ling Huang*    Signature:    Date:

Name: Neelakantan Sundaresan    Signature: [signed]    Date: [illegible]

*Unavailable for signature under MPEP § 715.04: no longer employed by IBM and unreachable
at last known mailing addresses, email addresses, and phone numbers.
A System and Method Based on Metadata for Eliminating Duplicates in Result Sets in Internet Search Engines




Disclosure ARC8-1999-0080

Created By: Neel Sundaresan
Last Modified By: Neel Sundaresan

*** IBM Confidential ***

Required fields are marked with the asterisk (*).



Summary 



Inventors with Lotus Notes IDs

Inventors: Anita Huang/Almaden/IBM, Neel Sundaresan/Almaden/IBM

Inventors without Lotus Notes IDs
IDT Selection 




Inventor Name    Inventor Serial    Div/Dept    Manager Serial    Manager Name

> denotes primary contact.











Main Idea

1. Describe your invention, stating the problem solved (if appropriate), and indicating the
advantages of using the invention.

Popular web documents are mirrored at several web sites. [...] The advantage of keeping it in such a form [...] data too. All of these are kept in XML/RDF form. [...] structural [...] which [...] way duplicates are eliminated.

In order to avoid [...] automatically eliminate those that [...]. The metadata for a Java program will contain a tag [...] it is a Java program. Obviously this piece of metadata will not be compared to one that corresponds to an XML data or a C++ program.
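The type-tag idea in the paragraph above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the record layout, the `type` field name, and the example URLs are all hypothetical. It shows only that once each metadata record carries a type tag, candidate pairs are formed within a type and never across types.

```python
from collections import defaultdict
from itertools import combinations


def bucket_by_type(records):
    """Group metadata records by their declared content type (hypothetical 'type' field)."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record["type"]].append(record)
    return dict(buckets)


def candidate_pairs(records):
    """Yield only pairs of records that share a type tag and so are worth comparing."""
    for bucket in bucket_by_type(records).values():
        yield from combinations(bucket, 2)


# Hypothetical records: two Java summaries, one C++ summary, one XML summary.
records = [
    {"url": "http://a.example/Foo.java", "type": "java"},
    {"url": "http://b.example/Foo.java", "type": "java"},
    {"url": "http://a.example/foo.cpp", "type": "c++"},
    {"url": "http://a.example/data.xml", "type": "xml"},
]

pairs = [(a["url"], b["url"]) for a, b in candidate_pairs(records)]
print(pairs)
```

With four records there are six possible pairs, but only the two Java summaries are ever compared; the C++ and XML summaries are never paired with them.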

2. How does the invention solve the problem or achieve an advantage (a description of "the
invention", including figures inline as appropriate)?



Block Diagrams: 








[Block diagram: legible labels are "A. Unordered Metadata Repository" and "Algorithm to Detect and Consolidate Duplicates"]

Figure 1

1. The crawlers summarize data from the World Wide Web (WWW) in RDF format and store the summaries in a metadata repository (A). The repository [...].

2. [...] the metadata in (A), taking advantage of [...] to [...] related data [...] to process. It consolidates groups of [...] (distinct URLs) into a single record. Each consolidated record also has a list of the duplicating URLs. The result is an ordered metadata repository (B).

3. The search engine indexes and queries the ordered metadata repository (B). As a result, it returns a single record for each set of duplicated instances rather than separate records for each.
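The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm of the disclosure: records are modeled as flat dictionaries of strings, the field names (`url`, `title`, `encoding`) are hypothetical, and "duplicate" is taken to mean that every field other than the URL matches exactly. Sorting first and then merging adjacent records is one simple way to obtain an ordered repository (B) from an unordered one (A).

```python
from itertools import groupby


def fingerprint(record):
    """Everything except the URL, in a canonical order, acts as the duplicate key."""
    return tuple(sorted((k, v) for k, v in record.items() if k != "url"))


def consolidate(records):
    """Sort records so duplicates are adjacent, then merge each run into one record."""
    ordered = sorted(records, key=fingerprint)
    consolidated = []
    for key, group in groupby(ordered, key=fingerprint):
        entry = dict(key)
        # Each consolidated record keeps the list of duplicating URLs.
        entry["urls"] = [record["url"] for record in group]
        consolidated.append(entry)
    return consolidated


# Hypothetical unordered repository (A): two mirrors of one page plus another page.
unordered_a = [
    {"url": "http://foobar.com/", "title": "Duplicate Example", "encoding": "8859_1"},
    {"url": "http://other.example/", "title": "Something Else", "encoding": "8859_1"},
    {"url": "http://mirror.example/foobar/", "title": "Duplicate Example", "encoding": "8859_1"},
]

ordered_b = consolidate(unordered_a)
for entry in ordered_b:
    print(entry["title"], entry["urls"])
```

A search engine that indexes `ordered_b` instead of `unordered_a` would return one hit for the mirrored page, with both URLs attached to it.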
Sample Metadata Summaries of Identical HTML Source Files



<?xml version="1.0"?>
<rdf:RDF [...]>
<rdf:Description
    [...]
    abc:summarizer="TheSummaryMaker"
    abc:mime-type="http/html"
    abc:html-title="Duplicate Example"
    abc:html-encoding="8859_1"
    abc:abstract="Duplicate Example!"
    rdf:HREF="[...]">
  <abc:ref-annotations>
    <rdf:Bag>
      <rdf:li>
        <rdf:Description
            annotatee="http://foobar.com/"
            annotations="- FOO"/>
      </rdf:li>
    </rdf:Bag>
  </abc:ref-annotations>
  <abc:presentation-text>
    <rdf:Bag>
      <rdf:li>Welcome to the FOOBAR Example!</rdf:li>
    </rdf:Bag>
  </abc:presentation-text>
</rdf:Description>
</rdf:RDF>


<?xml version="1.0"?>
<rdf:RDF [...]>
<rdf:Description
    abc:[...]="[...] GMT [...]"
    abc:[...]="Not Known"
    abc:mime-type="http/html"
    abc:source-[...]="Good"
    abc:comments="[...]"
    abc:html-title="Duplicate Example"
    abc:html-encoding="8859_1"
    abc:abstract="Duplicate Example!"
    rdf:HREF="[...]">
  <abc:ref-annotations>
    <rdf:Bag>
      <rdf:li>
        <rdf:Description
            annotatee="http://foobar.com/"
            annotations="- FOO"/>
      </rdf:li>
    </rdf:Bag>
  </abc:ref-annotations>
  <abc:presentation-text>
    <rdf:Bag>
      <rdf:li>Welcome to the FOOBAR Example!</rdf:li>
    </rdf:Bag>
  </abc:presentation-text>
</rdf:Description>
</rdf:RDF>



Figure 1 

[...] duplicating URLs without cluttering the search space.

Our method is better in that it does structure comparison. Instead of doing textual comparison, it
[...] of the obvious documents that should not be compared based
upon some key attribute values (like file name extensions, etc.).
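A structure comparison of this kind could look roughly like the Python sketch below. It is an assumption-laden illustration, not the disclosed method: the `abc` namespace URI is a placeholder, the summaries are simplified stand-ins for the samples above, and the choice to ignore only the `rdf:HREF` attribute is hypothetical. Each summary is parsed into a tree, location-specific attributes are dropped, and the remaining tag, attribute, text, and child structure is compared instead of the raw text.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABC = "http://example.com/abc#"  # placeholder namespace, not from the disclosure

# Attributes that differ between mirrors and are ignored (hypothetical choice).
IGNORED = {f"{{{RDF}}}HREF"}


def signature(element):
    """Reduce an element to a nested tuple of tag, kept attributes, text, and children."""
    attrs = tuple(sorted((k, v) for k, v in element.attrib.items() if k not in IGNORED))
    text = (element.text or "").strip()
    children = tuple(signature(child) for child in element)
    return (element.tag, attrs, text, children)


def same_structure(xml_a, xml_b):
    """True when two summaries match once location-specific attributes are ignored."""
    return signature(ET.fromstring(xml_a)) == signature(ET.fromstring(xml_b))


TEMPLATE = """<rdf:RDF xmlns:rdf="{rdf}" xmlns:abc="{abc}">
  <rdf:Description abc:html-title="{title}" rdf:HREF="{url}">
    <abc:presentation-text>Welcome to the FOOBAR Example!</abc:presentation-text>
  </rdf:Description>
</rdf:RDF>"""


def summary(title, url):
    """Build a simplified, hypothetical metadata summary for one URL."""
    return TEMPLATE.format(rdf=RDF, abc=ABC, title=title, url=url)


a = summary("Duplicate Example", "http://foobar.com/")
b = summary("Duplicate Example", "http://mirror.example/foobar/")
c = summary("Another Page", "http://foobar.com/other")

print(same_structure(a, b))
print(same_structure(a, c))
```

The two mirrored summaries compare equal even though their URLs differ, while the summary with a different title does not. Whitespace and attribute order in the serialized XML do not affect the result, which a plain textual comparison would not guarantee.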

3. If the same advantage or problem has been identified by others (inside/outside IBM), how have
those others solved it and does your solution differ and why is it better?




[...] found that they did not have a method for eliminating duplicates. Same with Infoseek and HotBot.

4. If the invention is implemented in a product or prototype, include technical details, purpose,
disclosure details to others and the date of that implementation.

Being incorporated [...] family of search engines and portals.